Advanced R

Joe Marlo

8/17/23

Agenda

  • Basics
    • Setting up R
    • Core R skill set
  • Statistics
    • Descriptive statistics
    • Regression
    • Model evaluation
  • Into the tidyverse
  • Extras
  • Next session(s): Bayesian statistics
    • Stan
    • Multilevel models

github.com/joemarlo/irs-adv-r/

marlo.works/irs-adv-r/advanced-R.html

Today’s schedule

  • 9:30am-10:45am: Session 1
  • 10:45am-11:00am: Break
  • 11:00am-12:00pm: Session 2
  • 12:00pm-12:45pm: Lunch break
  • 12:45pm-2:15pm: Session 3
  • 2:15pm-2:45pm: Break
  • 2:45pm-4:30pm: Session 4
  • 4:30pm+: Wrap-up and Questions

About me

  • Senior Data Scientist focused on statistics, Shiny, and R programming
  • Interested in sequence analysis and causal inference. Have authored R packages on both
  • Enjoy forecasting and Shiny
  • Work in a lab at New York University focused on making machine learning methods for causal inference more accessible
  • Have worked at firms such as J.P. Morgan and Verizon in research and data science roles

The Basics

Setting up R

!=

Setting up R

You have three options:

  • Use Posit Cloud
  • Use Posit Workbench (if available at the IRS)
  • Use your local computer – laptop or desktop that has R and RStudio installed

Why R?

R is a free software environment for statistical computing and graphics
- The R Project for Statistical Computing


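# simulate a predictor and a noisy square-root response
x <- rnorm(n = 100, mean = 10, sd = 2)
y <- x^(1/2) + rnorm(n = 100, mean = 0, sd = 0.5)
# fit a linear regression of y on x
my_model <- lm(y ~ x)
# plot the data and overlay the fitted values
plot(x, y)
lines(x = x, y = fitted(my_model))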

Break

Statistics

Statistics

  • Generating data
  • Descriptive statistics
  • Regression
  • Model evaluation
  • Time series (if there’s time)
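
A minimal sketch of the first two topics, using base R only. The variable names, sample size, and distribution parameters are illustrative assumptions, not taken from the session materials.

```r
# Generating data (all names and parameters below are made up for illustration)
set.seed(123)                                   # make the draws reproducible
n <- 200
amount <- rlnorm(n, meanlog = 3, sdlog = 0.5)   # right-skewed continuous variable
flag <- rbinom(n, size = 1, prob = 0.2)         # binary 0/1 indicator
group <- sample(c("a", "b", "c"), size = n, replace = TRUE)
dat <- data.frame(amount, flag, group)

# Descriptive statistics
summary(dat$amount)                             # min, quartiles, mean, max
amount_mean <- mean(dat$amount)
amount_sd <- sd(dat$amount)
quantile(dat$amount, probs = c(0.1, 0.5, 0.9))  # selected percentiles
group_counts <- table(dat$group)                # frequency table
tapply(dat$amount, dat$group, mean)             # mean of amount within each group
```

Setting the seed first means the same "random" data comes back on every run, which makes results easier to compare across machines.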

Break

Regression

  • Simple linear regression
  • The formula interface
  • Multiple regression
  • Visualizing models
  • Logistic regression
  • Count data
    • Poisson
    • Negative binomial
    • ZIP models
  • Model evaluation
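
The topics above can be sketched with base R and the built-in mtcars data. The choice of data set and of variables is an assumption made for illustration; the session may use different data.

```r
# Simple linear regression: mpg explained by weight
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple regression via the formula interface:
# `+` adds predictors and factor() treats cyl as categorical
fit_multi <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
summary(fit_multi)

# Logistic regression: am is a 0/1 outcome
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Poisson regression for count data: carb is a small non-negative integer
fit_pois <- glm(carb ~ hp, data = mtcars, family = poisson)

# Model evaluation: F-test for the nested linear models, then AIC
anova(fit_simple, fit_multi)
AIC(fit_simple, fit_multi)
```

Negative binomial and zero-inflated Poisson models are not in base R. Commonly used options are MASS::glm.nb() and pscl::zeroinfl(), both of which accept the same formula syntax.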

Break

Into the tidyverse

The tidyverse


An opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.


See more at tidyverse.org

Basic tidyverse

  • Pipe
  • dplyr
  • broom
  • ggplot2
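
A short sketch touching each of the four items, assuming dplyr, broom, and ggplot2 are installed. The mtcars data and the specific column choices are illustrative assumptions.

```r
library(dplyr)
library(broom)
library(ggplot2)

# Pipe + dplyr: each step passes its result to the next
cyl_summary <- mtcars %>%
  filter(hp > 60) %>%                     # keep rows meeting a condition
  mutate(wt_lbs = wt * 1000) %>%          # wt is recorded in 1000s of lbs
  group_by(cyl) %>%                       # one group per cylinder count
  summarize(mean_mpg = mean(mpg), n = n())

# broom: turn a fitted model into a tidy data frame of coefficients
fit <- lm(mpg ~ wt, data = mtcars)
coef_table <- tidy(fit)

# ggplot2: scatterplot with a fitted regression line
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
print(p)
```

On R 4.1 or later, the native pipe `|>` can replace `%>%` in this example.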

Extras

Reproducibility

More resources!